Clustering Categorical Data by Utilizing the Correlated-Force Ensemble

Authors

  • Kun-Ta Chuang
  • Ming-Syan Chen
Abstract

In this paper we explore a novel clustering algorithm for categorical data, named CORE (standing for CORrelated-Force Ensemble). Clustering categorical data is generally more difficult than clustering numerical data because categorical values have no inherent order. Although several clustering algorithms that concentrate on categorical data have been proposed, achieving the desired clustering quality remains a challenging issue. The correlation between attribute values carries significant information that can be exploited to aid clustering, especially when extracting clusters from high-dimensional data. By employing the concept of the correlated-force ensemble, the proposed algorithm, CORE, acquires clusters that consist of highly correlated sets of nominal attribute values. Experimental results on various real datasets show that CORE significantly outperforms prior work.
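
The idea that correlation between nominal attribute values carries clustering signal can be illustrated with a small sketch. This is not the CORE algorithm, whose details the abstract does not give; it only measures how often pairs of attribute values co-occur, using lift as a stand-in correlation measure. The function name `value_pair_lift` and the toy records are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def value_pair_lift(records):
    """Return the lift of every co-occurring pair of (attribute, value) items.

    A lift above 1 means the two values appear together more often than
    they would if they were independent. Lift is an illustrative choice,
    not the measure used by CORE.
    """
    n = len(records)
    single = Counter()  # how many records contain each (attribute, value)
    pair = Counter()    # how many records contain each pair of items
    for rec in records:
        items = sorted(rec.items())  # fixed order so each pair has one key
        single.update(items)
        pair.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


# Hypothetical toy data: four records with two nominal attributes.
records = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "blue", "shape": "round"},
]
lift = value_pair_lift(records)
for pair_key, value in sorted(lift.items()):
    print(pair_key, round(value, 3))
```

Pairs with high lift, such as `color=blue` with `shape=square` in this toy data, are the kind of strongly associated value sets that a correlation-driven method could try to group together.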

Similar resources

A Link-Based Cluster Collection Approach Combined Contagious Cluster With For Categorical Data Clustering

Data clustering is a challenging task in data mining. Various clustering algorithms have been developed to cluster or categorize datasets. Many algorithms are used to cluster categorical data, but some cannot be applied to it directly. Several attempts have been made to solve the problem of clustering categorical data via cluster ensembles. But th...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Presenting a Clustering Algorithm for Categorical Data by Combining Measures

Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups such that the data within a cluster are as close to each other as possible and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...

A Thorough Investigation of Link-Based Cluster Ensemble Approach for Data Clustering

Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a similarity measure based on a distance metric (e.g., Euclidean) to partition the database such that data points in the same partition are more similar than points in different partitions. The problem of clustering becomes more challenging when the data is ca...

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In ensemble clustering, several base clusterings are produced first, and then an aggregation function is used to create a final clustering that is as similar as possible to all the cluster bundles. The inp...

Publication date: 2004